-
Notifications
You must be signed in to change notification settings - Fork 365
Optionally disallow the use of 'root' user for Processes and Tasks of docker lifecycle Apps #4452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Optionally disallow the use of 'root' user for Processes and Tasks of docker lifecycle Apps #4452
Conversation
Edit: I still had to click sign again. But it seems to be working now. |
7c07dd0
to
c5fcbde
Compare
@acosta11 you will also need to open a PR to add this new property to https://github.com/cloudfoundry/capi-release |
Draft PR: cloudfoundry/capi-release#561 Still need to test this end to end with the bosh release config. |
I haven't validated this myself yet, but I don't think this PR covers the case where the |
Indeed it's not exercised through that complete flow. I can look at adding a test to explicitly configure the user to root in the docker metadata. I think the setup that I copied was only setting image information so would have to update the call to the factory. |
There is some complexity there, since the user specified in the Dockerfile ->
I think we need to check the user-that-will-be-used when actually starting to run the process/task (but, I'm open to other ideas). I'm fine if you want to break that into a separate PR, to keep things more manageable with this one. |
c5fcbde
to
ce9433b
Compare
Initial pass on the runtime check for 'root' user leverages the desired_lrp and task_action builder layer to avoid disrupting staging. Talking with Greg, I also considered going into all the relevant action implementations to start the error message a bit earlier in the call chain. That said, the builder level check ended up fairly succinct and easy to test, so I'm slightly favoring this implementation to avoid missing something in the action layer with a larger set of required changes. Curious if it looks reasonable to y'all. |
d7fd389
to
a3f85ca
Compare
Wrapped up end to end testing, added some error handling from an issue detected as part of that process, and rebased on latest main with the task handler and ccng config updates. Should be good for review now. See updated PR description for nsync edge case that I also validated in the e2e env. If anything, taking one last look at our handling of the |
a3f85ca
to
f31d46d
Compare
ef38937
to
824088f
Compare
spec/unit/lib/cloud_controller/diego/docker/lifecycle_protocol_spec.rb
Outdated
Show resolved
Hide resolved
824088f
to
188d860
Compare
Rebased on main again and ultimately removed all of the model layer changes in favor of the user allow list. Added more explicit tests at the diego builder layer for the cases where user is '0' or absent in the Dockerfile to round out the ways by which a root user may be specified. |
188d860
to
5c86d43
Compare
* Ignore user in droplet docker execution metadata via dockerfile for staging because user may be subsequently overridden on Process or Task model * Enforce that 'root' or '0' user is not used at runtime by task_action and desired_lrp builders * Re-raise the error in desire app handler. If not re-raised, error is supressed with no clear user feedback.
5c86d43
to
ff95407
Compare
Also added dev note to PR description about potentially unexpected behavior of the Process/Task policy not applying to the users set in Dockerfile specifically. Running CATs on env now to see if anything weird pops up. |
CATs appears to be working consistently for the standard app and docker suites. I'm mostly seeing what looks like flakes with logging tests. $ cat cats-config.json
{
"api": "<api>",
"apps_domain": "<apps>",
"admin_user": "admin",
"admin_password": "<cred>",
"skip_ssl_validation": true,
"use_http": true,
"include_apps": true,
"include_capi_experimental":true,
"include_detect": false,
"include_security_groups": false,
"include_services": false,
"include_v3": true,
"include_tasks": true,
"include_backend_compatibility": false,
"include_capi_no_bridge": false,
"include_container_networking": false,
"include_credhub" : false,
"include_docker": true,
"include_internet_dependent": false,
"include_isolation_segments": false,
"include_persistent_app": false,
"include_private_docker_registry": false,
"include_privileged_container_support": false,
"include_route_services": false,
"include_routing": false,
"include_routing_isolation_segments": false,
"include_service_discovery": false,
"include_service_instance_sharing": false,
"include_ssh": false,
"include_sso": false,
"include_zipkin": false,
"java_buildpack_name": "java_buildpack_offline"
}
CONFIG=cats-config.json ./bin/test --procs=4
...
Summarizing 2 Failures:
[FAIL] [app_syslog_tcp] Syslog Drain over TCP Syslog drains [It] forwards app messages to registered syslog drains via mtls
/workspace/cf-acceptance-tests/app_syslog_tcp/syslog_drain.go:152
[FAIL] [app_syslog_tcp] Syslog Drain over TCP Syslog drains [It] forwards app messages to registered syslog drains
/workspace/cf-acceptance-tests/app_syslog_tcp/syslog_drain.go:113
Ran 80 of 275 Specs in 1513.412 seconds
FAIL! -- 78 Passed | 2 Failed | 2 Pending | 193 Skipped
Ginkgo ran 1 suite in 25m15.173901372s Seems like I just need what looks like some flakey errors in the remaining mysql unit test here to pass. |
Proposed Change
Add a configuration option to allow or disallow the use of the 'root' and '0' user for processes and tasks of docker lifecycle apps.
Use 'vcap' as the default user in the case that the 'root' user is disallowed.Edit: opting to leave the user as is given docker images need to add the desired user anyway, which makes a non-root default less clear.Use Case
Operators can secure their runtime infrastructure by preventing the possibility of cascading user privilege escalation to the root user of the underlying os namespace construct realizing the desired Process.
Dev Notes
Given the scope of docker lifecycle apps, it seems like the validation logic for this feature has to live in the policy layer. If implemented in the message layer, I think it may have the unintended effect of allowing the root user in the net new context of buildpack lifecycle apps when configured to match the current default of allowed for docker apps (assuming the flag exists at this level of coarseness).Edit: the user allow list functionality in the policy layer works to this end, but we still need to ensure docker apps that set the user in the Dockerfile are appropriately handled.Note this implementation specifically only checks for the 'root' or '0' user, so apps that appropriately set non-root users in the Dockerfile do not have to go through the same additional allowed users list validation as the Process and Task model. This behavior is technically inconsistent with other lifecycles and could be confusing for some users, but it does offer the advantage of more compatibility with existing docker apps. And technically the behavior is consistent with the spirit of preventing root user privilege escalation.
Confirmed in manual end to end testing that disallowing 'root' user on the cloud_controller_clock will cause an error in the nsync process for apps with the root user that were previously valid. Triggered a new sync by manually deleting the lrp on the diego cell. The error does not get surfaced to the api/cli. Instead, a user would have to go into the bosh logs and see the attempted sync on the scheduler job. Note that allowing the 'root' user on the clock job avoids this error, but still prevents new process/task updates from using the 'root' user, so it seems like a reasonable option for backwards compatibility.
Example error:
Checklist
I have reviewed the contributing guide
I have viewed, signed, and submitted the Contributor License Agreement
I have made this pull request to the
main
branchI have run all the unit tests using
bundle exec rake
I have run CF Acceptance Tests